Automated WordNet Construction Using Word Embeddings

Authors

  • Mikhail Khodak
  • Andrej Risteski
  • Christiane Fellbaum
  • Sanjeev Arora
Abstract

We present a fully unsupervised method for automated construction of WordNets based upon recent advances in distributional representations of sentences and word-senses combined with readily available machine translation tools. The approach requires very few linguistic resources and is thus extensible to multiple target languages. To evaluate our method we construct two 600-word test sets for word-to-synset matching in French and Russian using native speakers and evaluate the performance of our method along with several other recent approaches. Our method exceeds the best language-specific and multi-lingual automated WordNets in F-score for both languages. The databases we construct for French and Russian, both languages without large publicly available manually constructed WordNets, will be publicly released along with the test sets.
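The abstract reports results as F-scores on word-to-synset matching. As a rough illustration of how such a score could be computed, the sketch below compares predicted synsets against gold synsets for each test word. The function name, the synset identifiers, and the choice of micro-averaging over all word-synset pairs are assumptions made for this example; the abstract does not say how the authors aggregate scores.

```python
def f_score(predicted, gold):
    """Micro-averaged precision, recall and F1 for word-to-synset matching.

    Both arguments map a word to a set of synset IDs (hypothetical IDs here).
    Micro-averaging is an assumption, not necessarily the authors' protocol.
    """
    tp = fp = fn = 0
    for word, gold_synsets in gold.items():
        pred = predicted.get(word, set())
        tp += len(pred & gold_synsets)   # correctly matched synsets
        fp += len(pred - gold_synsets)   # predicted but not in gold
        fn += len(gold_synsets - pred)   # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy data with made-up synset IDs, for illustration only.
gold = {"chien": {"dog.n.01"}, "banque": {"bank.n.01", "bank.n.02"}}
predicted = {"chien": {"dog.n.01", "frank.n.02"}, "banque": {"bank.n.01"}}
precision, recall, f1 = f_score(predicted, gold)
```

In this toy case two of the three predicted pairs are correct and two of the three gold pairs are recovered, so precision, recall and F1 all come out to 2/3.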

Similar resources

Automatic Construction of Persian ICT WordNet using Princeton WordNet

WordNet is a large lexical database of the English language in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is essentially used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...

Enhancing Automatic Wordnet Construction Using Word Embeddings

Researchers have shown that a wordnet for a new language, possibly resource-poor, can be constructed automatically by translating wordnets of resource-rich languages. The quality of these constructed wordnets is affected by the quality of the resources used such as dictionaries and translation methods in the construction process. Recent work shows that vector representation of words (word embed...

Extending and Improving Wordnet via Unsupervised Word Embeddings

This work presents an unsupervised approach for improving WordNet that builds upon recent advances in document and sense representation via distributional semantics. We apply our methods to construct Wordnets in French and Russian, languages which both lack good manual constructions. These are evaluated on two new 600-word test sets for word-to-synset matching and found to improve greatly upon...

Merging Verb Senses of Hindi WordNet using Word Embeddings

In this paper, we present an approach for merging fine-grained verb senses of Hindi WordNet. Senses are merged based on gloss similarity score. We explore the use of word embeddings for gloss similarity computation and compare with various WordNet based gloss similarity measures. Our results indicate that word embeddings show significant improvement over WordNet based measures. Consequently, we...

Injecting Word Embeddings with Another Language's Resource : An Application of Bilingual Embeddings

Word embeddings learned from text corpus can be improved by injecting knowledge from external resources, while at the same time also specializing them for similarity or relatedness. These knowledge resources (like WordNet, Paraphrase Database) may not exist for all languages. In this work we introduce a method to inject word embeddings of a language with knowledge resource of another language b...

Publication date: 2017